Pairwise document similarity measure based on present term set
Similar resources
A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique used to predict the profiles of the authors of unknown texts by analyzing their writing styles. Author profiles are characteristics of the authors such as gender, age, native language, country and educational background. The existing approaches for Author Profiling suffer from problems like high dimensionality of features and fail to capture th...
Learning a concept-based document similarity measure
Document similarity measures are crucial components of many text-analysis tasks, including information retrieval, document classification, and document clustering. Conventional measures are brittle: They estimate the surface overlap between documents based on the words they mention and ignore deeper semantic connections. We propose a new measure that assesses similarity at both the lexical and ...
Ontology based Similarity Measure in Document Ranking
This paper presents a methodology for the ontology based semantic annotation of web pages with annotation weighting scheme that takes advantage of the different relevance of structured document fields. The retrieval model is based on the importance factors of the structural elements, which are used to re-rank the documents retrieval by the ontology based distance measure. The relevance concept ...
Investigating Measures for Pairwise Document Similarity
The need for a more effective similarity measure is growing as a result of the astonishing amount of information being placed online. Most existing similarity measures are defined by empirically derived formulas and cannot easily be extended to new applications. We present a pairwise document similarity measure based on Information Theory, and present corpus dependent and independent applicatio...
A judgment set similarity measure based on prime implicants
Distances and scores are widely used to measure similarity between collections of information, such as preference profiles, belief sets, judgment sets, argument labelings, etc. Defining a function that quantifies the similarity between information sets of logically interrelated information is nontrivial, as witnessed by the shortage of such quantifiers in the literature. We propose a similarity...
Journal
Journal title: Journal of Big Data
Year: 2018
ISSN: 2196-1115
DOI: 10.1186/s40537-018-0163-2